Overview:
pandas simplifies data processing by making one assumption: most of the data handled by present-day systems is one-dimensional or two-dimensional. Some data has more dimensions, or needs a higher-dimensional abstraction, but one-dimensional and two-dimensional data are the two abstractions most commonly used.
Through the classes Series and DataFrame, pandas provides versatile and powerful capabilities for processing one-dimensional and two-dimensional data. In terms of volume, such data can be very large.
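As a minimal sketch of the two abstractions, the following creates a Series and a DataFrame and prints their dimensionality. The sample values and the names temperatures and readings are made up for illustration.

```python
import pandas as pd

# One-dimensional data: a labelled sequence of values (illustrative numbers)
temperatures = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])

# Two-dimensional data: labelled rows and columns (illustrative numbers)
readings = pd.DataFrame(
    {"temperature": [21.5, 23.0, 19.8], "humidity": [40, 35, 55]},
    index=["mon", "tue", "wed"],
)

# ndim gives the number of dimensions, shape gives the size along each one
print(temperatures.ndim, temperatures.shape)
print(readings.ndim, readings.shape)
```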
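To support such large volumes, the Series and DataFrame classes are built on NumPy's ndarray as their underlying data structure. A developer can access the underlying ndarray through the DataFrame attribute values or through the method to_numpy(). When the columns do not share a single dtype, pandas may have to build a new array of a common dtype, so the result is not guaranteed to be a view of the DataFrame's own storage.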
Example:
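# Example Python program that prints the underlying ndarray
import pandas as pd

# Binary data
binary_data = [(0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0, 0, 1),
               (1, 1, 0, 0, 0, 0, 1, 1), (1, 1, 1, 0, 0, 1, 1, 1),
               (1, 1, 1, 1, 1, 1, 1, 1), (1, 1, 1, 0, 0, 1, 1, 1),
               (1, 1, 0, 0, 0, 0, 1, 1), (0, 0, 0, 0, 0, 0, 0, 0)]

# Make a DataFrame from binary data
df = pd.DataFrame(binary_data)

# Get the underlying ndarray
array = df.to_numpy()
print("The underlying numpy array:")
print(array)
print(type(array))

# Know the buffer start location of this ndarray
print(f"The buffer location:{array.data}")
print(f"The buffer location in the original DataFrame:{df.values.data}")

# Check whether to_numpy() just returned a view
print("Memory layout information:")
print(array.flags)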
Output:
The underlying numpy array:
[[0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 1]
 [1 1 0 0 0 0 1 1]
 [1 1 1 0 0 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 0 0 1 1 1]
 [1 1 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0]]
<class 'numpy.ndarray'>
The buffer location:<memory at 0x10f0551e0>
The buffer location in the original DataFrame:<memory at 0x10f0551e0>
Memory layout information:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
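The value printed for ndarray.data is the repr of a temporary memoryview object, so two matching addresses in the output above are not by themselves proof that the arrays share a buffer. A more direct way to check whether to_numpy() returned a view is numpy.shares_memory(), sketched below. Whether the default call yields a view or a copy depends on the column dtypes and on the pandas version, so the sketch makes no claim about that case and prints only the result. The DataFrame contents and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A small single-dtype DataFrame (illustrative data)
df = pd.DataFrame([[0, 1], [1, 0], [1, 1]])

# Default call: pandas may return a view when it can avoid copying
maybe_view = df.to_numpy()

# copy=True asks pandas to ensure the result is not a view of the DataFrame's data
forced_copy = df.to_numpy(copy=True)

# Compare the arrays' buffers instead of comparing printed addresses
print("Default result shares memory with df.values:",
      np.shares_memory(maybe_view, df.values))
print("Forced copy shares memory with df.values:",
      np.shares_memory(forced_copy, df.values))

# Address of the first element of each array's data
print(hex(maybe_view.__array_interface__["data"][0]))
print(hex(forced_copy.__array_interface__["data"][0]))
```

The OWNDATA flag alone is a weaker signal: an array that is a transposed view of a fresh copy also reports OWNDATA as False, even though it shares nothing with the DataFrame.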